1,224 research outputs found

    Parametric Fokker-Planck equation

    We derive the Fokker-Planck equation on the parametric space. It is the Wasserstein gradient flow of relative entropy on the statistical manifold. We pull back the PDE to a finite-dimensional ODE on parameter space. Analytical and numerical examples are presented.

    Adaptive introgression underlies polymorphic seasonal camouflage in snowshoe hares

    Snowshoe hares (Lepus americanus) maintain seasonal camouflage by molting to a white winter coat, but some hares remain brown during the winter in regions with low snow cover. We show that cis-regulatory variation controlling seasonal expression of the Agouti gene underlies this adaptive winter camouflage polymorphism. Genetic variation at Agouti clustered by winter coat color across multiple hare and jackrabbit species, revealing a history of recurrent interspecific gene flow. Brown winter coats in snowshoe hares likely originated from an introgressed black-tailed jackrabbit allele that has swept to high frequency in mild winter environments. These discoveries show that introgression of genetic variants that underlie key ecological traits can seed past and ongoing adaptation to rapidly changing environments. (c) The Authors, Some Rights Reserved

    The Link between Dengue Incidence and El Niño Southern Oscillation

    Pejman Rohani discusses a new study that examined the dynamic relationship between climate variables and dengue incidence in Thailand, Mexico, and Puerto Rico

    Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

    From medical charts to national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare today generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitalized patient records, it is important to remember that volume represents only one aspect of the data. In fact, drawing on data from an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques. Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figure

    Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing

    Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area. However, the recognition performance of current approaches could still be improved. Our novel approach is to combine support vector machines (SVMs) and conditional random fields (CRFs), which can complement and facilitate each other. During the hybrid process, we use SVM to separate biological terms from non-biological terms, before we use CRFs to determine the types of biological terms, which makes full use of the power of SVM as a binary-class classifier and the data-labeling capacity of CRFs. We then merge the results of SVM and CRFs. To remove any inconsistencies that might result from the merging, we develop a useful algorithm and apply two rules. To ensure biological terms with a maximum length are identified, we propose a maximal bidirectional squeezing approach that finds the longest term. We also add a positive gain to rare events to reinforce their probability and avoid bias. Our approach will also gradually extend the context so more contextual information can be included. We examined the performance of four approaches with the GENIA corpus and JNLPBA04 data. The combination of SVM and CRFs improved performance. The macro-precision, macro-recall, and macro-F1 of the SVM-CRFs hybrid approach surpassed conventional SVM and CRFs. After applying the new algorithms, the macro-F1 reached 91.67% with the GENIA corpus and 84.04% with the JNLPBA04 data.
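For readers unfamiliar with the macro-averaged metrics quoted in this abstract, the following is a minimal sketch of how macro-precision, macro-recall, and macro-F1 are commonly computed. It is an illustration only, not the authors' evaluation code. It assumes one label per item (a simplification of entity-level NER scoring) and takes macro-F1 as the mean of the per-class F1 scores; the abstract does not say which convention was used. The function name `macro_scores` and the example labels are hypothetical.

```python
from collections import Counter


def macro_scores(gold, pred):
    """Return (macro-precision, macro-recall, macro-F1).

    Assumption: gold and pred are equal-length sequences of class labels,
    and macro-F1 is the unweighted mean of per-class F1 scores.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # missed an item of class g

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels, not data from the paper.
gold = ["protein", "protein", "DNA", "DNA"]
pred = ["protein", "DNA", "DNA", "DNA"]
macro_p, macro_r, macro_f1 = macro_scores(gold, pred)
```

Because every class contributes equally to a macro average, rare entity types weigh as much as frequent ones, which is why the metric is often reported for imbalanced label sets.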

    The Global Diversity of Parasitic Isopods Associated with Crustacean Hosts (Isopoda: Bopyroidea and Cryptoniscoidea)

    Parasitic isopods of Bopyroidea and Cryptoniscoidea (commonly referred to as epicarideans) are unique in using crustaceans as both intermediate and definitive hosts. In total, 795 epicarideans are known, representing ∼7.7% of described isopods. The rate of description of parasitic species has not matched that of free-living isopods and this disparity will likely continue due to the more cryptic nature of these parasites. Distribution patterns of epicarideans are influenced by a combination of their definitive (both benthic and pelagic species) and intermediate (pelagic copepod) host distributions, although host specificity is poorly known for most species. Among epicarideans, nearly all species in Bopyroidea are ectoparasitic on decapod hosts. Bopyrids are the most diverse taxon (605 species), with their highest diversity in the North West Pacific (139 species), East Asian Sea (120 species), and Central Indian Ocean (44 species). The diversity patterns of Cryptoniscoidea (99 species, endoparasites of a diverse assemblage of crustacean hosts) are distinct from bopyrids, with the greatest diversity of cryptoniscoids in the North East Atlantic (18 species) followed by the Antarctic, Mediterranean, and Arctic regions (13, 12, and 8 species, respectively). Dajidae (54 species, ectoparasites of shrimp, mysids, and euphausids) exhibits highest diversity in the Antarctic (7 species) with 14 species in the Arctic and North East Atlantic regions combined. Entoniscidae (37 species, endoparasites within anomuran, brachyuran and shrimp hosts) show highest diversity in the North West Pacific (10 species) and North East Atlantic (8 species). Most epicarideans are known from relatively shallow waters, although some bopyrids are known from depths below 4000 m. 
    Lack of parasitic groups in certain geographic areas is likely a sampling artifact and we predict that the Central Indian Ocean and East Asian Sea (in particular, the Indo-Malay-Philippines Archipelago) hold a wealth of undescribed species, reflecting our knowledge of host diversity patterns.

    Incorporating rich background knowledge for gene named entity classification and recognition

    Background: Gene named entity classification and recognition are crucial preliminary steps of text mining in biomedical literature. Machine learning based methods have been used in this area with great success. In most state-of-the-art systems, elaborately designed lexical features, such as words, n-grams, and morphology patterns, have played a central part. However, this type of feature tends to cause extreme sparseness in feature space. As a result, out-of-vocabulary (OOV) terms in the training data are not modeled well due to lack of information. Results: We propose a general framework for gene named entity representation, called feature coupling generalization (FCG). The basic idea is to generate higher level features using term frequency and co-occurrence information of highly indicative features in a huge amount of unlabeled data. We examine its performance in a named entity classification task, which is designed to remove non-gene entries in a large dictionary derived from online resources. The results show that new features generated by FCG outperform lexical features by 5.97 in F-score, and by 10.85 for OOV terms. Also in this framework each extension yields significant improvements and the sparse lexical features can be transformed into both a lower dimensional and more informative representation. A forward maximum match method based on the refined dictionary produces an F-score of 86.2 on the BioCreative 2 GM test set. Then we combined the dictionary with a conditional random field (CRF) based gene mention tagger, achieving an F-score of 89.05, which improves the performance of the CRF-based tagger by 4.46 with little impact on the efficiency of the recognition system. A demo of the NER system is available at http://202.118.75.18:8080/bioner.
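The forward maximum match method named in this abstract is a standard greedy dictionary lookup: scanning left to right, take the longest dictionary entry that starts at the current position. The sketch below illustrates that general technique only. It is not the authors' implementation, and it assumes whitespace tokenization, exact string matching, and a toy dictionary; the names `forward_maximum_match` and `max_len` are hypothetical.

```python
def forward_maximum_match(tokens, dictionary, max_len=5):
    """Greedy left-to-right longest-match lookup.

    Returns a list of (start, end, term) tuples, with `end` exclusive.
    Assumption: dictionary entries are space-joined token strings of at
    most `max_len` tokens.
    """
    matches = []
    i = 0
    while i < len(tokens):
        found = False
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                matches.append((i, i + n, candidate))
                i += n  # continue after the matched term
                found = True
                break
        if not found:
            i += 1  # no entry starts here; move one token forward
    return matches


# Hypothetical toy dictionary and sentence, not data from the paper.
gene_dictionary = {"tumor necrosis factor", "tumor necrosis factor alpha", "p53"}
sentence = "the tumor necrosis factor alpha gene and p53".split()
found_terms = forward_maximum_match(sentence, gene_dictionary)
```

Preferring the longest entry keeps "tumor necrosis factor alpha" as a single mention instead of stopping at the shorter "tumor necrosis factor".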

    Disease and the Extended Phenotype: Parasites Control Host Performance and Survival through Induced Changes in Body Plan

    BACKGROUND: By definition, parasites harm their hosts. However, some forms of parasite-induced alterations increase parasite transmission between hosts, such that manipulated hosts can be considered extensions of the parasite's phenotype. While well accepted in principle, surprisingly few studies have quantified how parasite manipulations alter host performance and survival under field and laboratory conditions. METHODOLOGY/PRINCIPAL FINDINGS: By interfering with limb development, the trematode Ribeiroia ondatrae causes particularly severe morphological alterations within amphibian hosts that provide an ideal system to evaluate parasite-induced changes in phenotype. Here, we coupled laboratory performance trials with a capture-mark-recapture study of 1388 Pacific chorus frogs (Pseudacris regilla) to quantify the effects of parasite-induced malformations on host locomotion, foraging, and survival. Malformations, which affected ~50% of metamorphosing frogs in nature, caused dramatic reductions in all measures of organismal function. Malformed frogs exhibited significantly shorter jumping distances (41% reduction), slower swimming speeds (37% reduction), reduced endurance (66% reduction), and lower foraging success relative to infected hosts without malformations. Furthermore, while normal and malformed individuals had comparable survival within predator-free exclosures, deformed frogs in natural populations had 22% lower biweekly survival than normal frogs and rarely recruited to the adult population over a two-year period. CONCLUSIONS/SIGNIFICANCE: Our results highlight the ability of parasites to deeply alter multiple dimensions of host phenotype with important consequences for performance and survival. These patterns were best explained by malformation status, rather than infection per se, helping to decouple the direct and indirect effects of parasitism on host fitness. Brett A. Goodman and Pieter T. J. Johnson